
America) with this computer program, which merely arranges the letters beneath one another as optimally as possible (hence the terms sequence comparison and alignment). The decisive prerequisite is that one correctly understands the biological meaning of the results, and this is precisely the work of the bioinformatician.
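To make "arranging letters beneath one another as optimally as possible" concrete, here is a minimal sketch of a global pairwise alignment by dynamic programming (the Needleman-Wunsch approach). The function name `global_align` and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions, not taken from the text. BLAST itself uses a faster heuristic local search rather than this exhaustive method.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Globally align two sequences; return (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    n, m = len(a), len(b)

    def pair(i: int, j: int) -> int:
        # Score for placing a[i-1] directly above b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # letters paired
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    s, top, bottom = global_align("ACGT", "AGT")
    print(s)       # 2
    print(top)     # ACGT
    print(bottom)  # A-GT
```

The program only produces the arrangement and a score. Whether the aligned positions mean anything biologically is still a matter of interpretation.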

Understanding Data

What do you actually have to pay special attention to if, for example, you now perform such sequence comparisons yourself? It is important to know that the BLAST search is heuristic and therefore not completely exact, but it delivers results faster than a full 1:1 comparison over the entire sequence length against the database. A hit is therefore only credible if the probability of obtaining it by chance is low enough. As a first rule of thumb, remember: the E-value (i.e. the number of hits of at least this quality expected by chance) should be less than 1 in one million (E < 10^-6). This is already a very convincing value. In borderline cases (E-value around 1 in 1000), you can also take the hit sequence and see whether a search with it finds the initial sequence again (called "reverse search" in technical jargon). If we keep in mind that BLAST is a local search, we also understand why we should check the whole hit length (in the example given, the sequence similarity extends over the whole sequence length). But there are also BLAST results where only one subsequence of the protein has high similarity and the rest shows no similarity. In this case, the BLAST search has turned up only one protein domain, the one with the highest similarity in the whole database. To determine the function of the remaining parts of the sequence as well, you then need to search again using only those domains that do not yet have database hits, without the first sequence part

1  Sequence Analysis: Deciphering the Language of Life